Text-based games present a unique class of sequential decision-making problems in which agents interact with a partially observable, simulated environment via actions and observations conveyed through natural language. Such observations typically include instructions that, in a reinforcement learning (RL) setting, can directly or indirectly guide a player towards completing reward-worthy tasks. In this work, we study the ability of RL agents to follow such instructions. We conduct experiments showing that the performance of state-of-the-art text-based game agents is largely unaffected by the presence or absence of such instructions, and that these agents are typically unable to execute tasks to completion. To further study and address the task of instruction following, we equip RL agents with an internal structured representation of natural language instructions in the form of Linear Temporal Logic (LTL), a formal language that is increasingly used for temporally extended reward specification in RL. Our framework both supports and highlights the benefit of understanding the temporal semantics of instructions and of measuring progress towards the achievement of such temporally extended behaviour. Experiments with 500+ games in TextWorld demonstrate the superior performance of our approach.
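To make the LTL idea concrete, the sketch below (an illustration only, not the paper's implementation; the formula grammar, predicate names, and `progress` helper are assumptions) encodes a two-step TextWorld-style instruction as a small temporal formula and rewrites it as events are observed, which is one way to track progress towards a temporally extended behaviour.

```python
# Minimal sketch: an instruction as an LTL-like formula that is "progressed"
# each step on the set of events detected in the observation.
# Formula grammar (illustrative): ('event', p) | ('and', f, g) | ('then', f, g),
# where ('then', f, g) means "achieve f, and only afterwards work on g".

TRUE = ('true',)

def progress(formula, observed_events):
    """Rewrite the formula given the events observed this step."""
    kind = formula[0]
    if kind == 'true':
        return TRUE
    if kind == 'event':                      # eventually(p): satisfied once p is observed
        return TRUE if formula[1] in observed_events else formula
    if kind == 'and':
        left = progress(formula[1], observed_events)
        right = progress(formula[2], observed_events)
        if left == TRUE:
            return right
        if right == TRUE:
            return left
        return ('and', left, right)
    if kind == 'then':                       # only progress the second part after the first is done
        left = progress(formula[1], observed_events)
        if left == TRUE:
            return progress(formula[2], observed_events)
        return ('then', left, formula[2])
    raise ValueError(f"unknown operator: {kind}")

# "Take the apple, then put it on the table"
instruction = ('then', ('event', 'taken(apple)'), ('event', 'on(apple, table)'))
instruction = progress(instruction, {'taken(apple)'})      # -> ('event', 'on(apple, table)')
instruction = progress(instruction, {'on(apple, table)'})  # -> ('true',)  task complete
```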
Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically in stochastic environments, since trajectories that achieve a given return may have done so only due to luck. In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution. Rather than conditioning on the return of a single trajectory, as is standard practice, our proposed method, ESPER, learns to cluster trajectories and conditions on average cluster returns, which are independent of environment stochasticity. Doing so allows ESPER to achieve strong alignment between target return and expected performance in real environments. We demonstrate this on several challenging stochastic offline-RL tasks, including the puzzle game 2048 and Connect Four played against a stochastic opponent. In all tested domains, ESPER achieves significantly better alignment between the target return and the achieved return than simply conditioning on returns. ESPER also achieves higher maximum performance than even the value-based baselines.
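The sketch below illustrates the relabeling idea in a simplified form: trajectories are clustered and each one's conditioning target is replaced by its cluster's average return. ESPER learns the clustering adversarially so it is independent of environment stochasticity; plain k-means over assumed per-trajectory features is used here only as a stand-in.

```python
# Simplified sketch of return relabeling: condition on a cluster's average
# return instead of each trajectory's own (luck-dependent) return.
import numpy as np
from sklearn.cluster import KMeans

def relabel_returns(trajectory_features, returns, n_clusters=8, seed=0):
    """trajectory_features: (N, D) array of per-trajectory embeddings (assumed given).
    returns: (N,) array of observed returns.
    Returns each trajectory's conditioning target: its cluster's mean return."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(trajectory_features)
    cluster_means = np.array([returns[labels == k].mean() for k in range(n_clusters)])
    return cluster_means[labels]

# The relabeled targets would then replace raw returns when training a
# return-conditioned model such as Decision Transformer.
```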
Reinforcement learning (RL) is a central problem in artificial intelligence. The problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment, where optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide an automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
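For concreteness, a minimal reward-machine sketch follows (an assumed illustration, not the paper's code): a finite automaton over high-level propositions whose transitions emit reward, here for an illustrative "get coffee, then reach the office" task. Each automaton state corresponds to one subproblem that could be tackled with a memoryless policy.

```python
# Minimal sketch of a reward machine over high-level propositions.
class RewardMachine:
    def __init__(self, initial_state, transitions, terminal_states):
        # transitions: {(state, proposition): (next_state, reward)}
        self.state = initial_state
        self.transitions = transitions
        self.terminal_states = terminal_states

    def step(self, true_propositions):
        """Advance the machine on the propositions detected this environment step."""
        for prop in true_propositions:
            if (self.state, prop) in self.transitions:
                self.state, reward = self.transitions[(self.state, prop)]
                return reward, self.state in self.terminal_states
        return 0.0, self.state in self.terminal_states

rm = RewardMachine(
    initial_state='u0',
    transitions={('u0', 'coffee'): ('u1', 0.0),   # picked up coffee: no reward yet
                 ('u1', 'office'): ('u2', 1.0)},  # delivered it: reward and terminate
    terminal_states={'u2'},
)
```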
The ability to form non-line-of-sight (NLOS) images of changing scenes could be transformative in a variety of fields, including search and rescue, autonomous vehicle navigation, and reconnaissance. Most existing active NLOS methods illuminate the hidden scene with a pulsed laser directed at a relay surface and collect time-resolved measurements of the returning light. Prevailing approaches include raster scanning a rectangular grid on a vertical wall opposite the volume of interest to generate a collection of confocal measurements. These are inherently limited by the need for laser scanning. Methods that avoid laser scanning track the moving parts of the hidden scene as one or two point targets. In this work, based on more complete optical response modeling yet still without multiple illumination positions, we demonstrate accurate reconstructions of objects in motion and a "map" of the stationary scenery behind them. The ability to count, localize, and characterize the sizes of hidden objects in motion, combined with mapping of the stationary hidden scene, could greatly improve indoor situational awareness in a variety of applications.
Concept drift in process mining (PM) is a challenge because classical methods assume processes to be in a steady state, i.e., that events share the same process version. We conducted a systematic literature review on the intersection of these areas, reviewing concept drift in process mining and presenting a taxonomy of existing techniques for drift detection and online process mining in evolving environments. Existing works indicate that (i) PM still primarily focuses on offline analysis, and (ii) the evaluation of concept drift techniques for processes is cumbersome due to the lack of common evaluation protocols, datasets, and metrics.
Adversarial examples, or nearly indistinguishable inputs created by an attacker, significantly reduce machine learning accuracy. Theoretical evidence has shown that the high intrinsic dimensionality of datasets facilitates an adversary's ability to craft effective adversarial examples against classification models. Adjacently, how data are presented to a learning model affects its performance. For example, dimensionality reduction techniques have been used to aid the generalization of features in machine learning applications. Consequently, data transformation techniques go hand in hand with state-of-the-art learning models in decision-making applications such as intelligent medical or military systems. With this work, we explore how data transformation techniques such as feature selection, dimensionality reduction, or trend extraction may impact an adversary's ability to create effective adversarial samples against a recurrent neural network (RNN). Specifically, we analyze this from the perspective of the data manifold and the presentation of its intrinsic features. Our evaluations empirically show that feature selection and trend extraction techniques may increase the RNN's vulnerability. A data transformation technique reduces vulnerability to adversarial examples only if it approximates the dataset's intrinsic dimension, minimizes codimension, and maintains higher manifold coverage.
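The sketch below shows one assumed way to probe such an effect: train a small recurrent classifier and compare its accuracy under single-step FGSM perturbations with and without a dimensionality-reducing preprocessing step. The model, data, and epsilon are placeholders, not the paper's experimental setup.

```python
# Illustrative sketch: how robust is an RNN classifier to one FGSM step,
# measured on raw versus transformed input sequences?
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim=32, n_classes=2):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):                 # x: (batch, time, input_dim)
        _, h = self.rnn(x)
        return self.head(h[-1])

def fgsm_accuracy(model, x, y, epsilon=0.1):
    """Accuracy on inputs perturbed by one FGSM step in input space."""
    x = x.detach().clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    with torch.no_grad():
        return (model(x_adv).argmax(dim=1) == y).float().mean().item()

# Comparing fgsm_accuracy on raw sequences versus sequences projected onto
# their leading principal components (per time step) gives one concrete handle
# on how a transformation shifts the manifold the adversary must work on.
```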